Extending CCGbank with Quotes and Multi-modal CCG

نویسندگان

  • Daniel Tse
  • James R. Curran
چکیده

CCGbank is an automatic conversion of the Penn Treebank to Combinatory Categorial Grammar (CCG). We present two extensions to CCGbank which involve manipulating its derivation and category structure. We discuss approaches for the automatic re-insertion of removed quote symbols and evaluate their impact on the performance of the C&C CCG parser. We also analyse CCGbank to extract a multi-modal CCG lexicon, which will allow the removal of hardcoded language-specific constraints from the C&C parser, granting benefits to parsing speed and accuracy.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Creating a CCGbank and a Wide-Coverage CCG Lexicon for German

We present an algorithm which creates a German CCGbank by translating the syntax graphs in the German Tiger corpus into CCG derivation trees. The resulting corpus contains 46,628 derivations, covering 95% of all complete sentences in Tiger. Lexicons extracted from this corpus contain correct lexical entries for 94% of all known tokens in unseen text.

متن کامل

Integrating Verb-Particle Constructions into CCG Parsing

Despite their prevalence in the English language, multiword expressions like verb-particle constructions (VPCs) are often poorly handled by NLP systems. This problem is partly due to inadequacies in existing corpora; the primary corpus for CCG-oriented work, CCGbank, does not account for VPCs at all, and is inconsistent in its handling of them. In this paper, we apply some corrective transforma...

متن کامل

CCGbank: A Corpus of CCG Derivations and Dependency Structures Extracted from the Penn Treebank

This article presents an algorithm for translating the Penn Treebank into a corpus of Combinatory Categorial Grammar (CCG) derivations augmented with local and long-range word–word dependencies. The resulting corpus,CCGbank,includes 99.4% of the sentences in the Penn Treebank. It is available from the Linguistic Data Consortium,and has been used to train widecoverage statistical parsers that ob...

متن کامل

Fully Lexicalising CCGbank with Hat Categories

We introduce an extension to CCG that allows form and function to be represented simultaneously, reducing the proliferation of modifier categories seen in standard CCG analyses. We can then remove the non-combinatory rules CCGbank uses to address this problem, producing a grammar that is fully lexicalised and far less ambiguous. There are intrinsic benefits to full lexicalisation, such as seman...

متن کامل

Parsing CCGbank with the Lambek Calculus

This paper will analyze CCGbank, a corpus of CCG derivations, for use with the Lambek calculus. We also present a Java implementation of the parsing algorithm for the Lambek calculus presented in Fowler (2009) and the results of experiments using that algorithm to parse the categories in CCGbank. We conclude that the Lambek calculus is computationally tractable for this task and provide insight...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2007